JOURNAL OF KUNMING METALLURGY COLLEGE, 2015, Vol. 31, Issue (5): 65-69. DOI: 10.3969/j.issn.1009-0479.2015.05.012
LIU Xu
Abstract:
In current domestic corpus-based research, AntConc and PowerGREP are the main research tools, and few studies use the Python NLTK package for data processing and analysis. Because of design limitations, the existing software cannot support some research methods. This study uses the Python NLTK package so that the data are processed to a uniform standard, which avoids the trouble of converting data between different tools such as the various kinds of word-processing software. NLTK also makes up for the existing tools' shortcomings in syntactic analysis, graphics, regular expression search, and similar tasks. This paper briefly introduces the application of the Python-based NLTK package in corpus research. It then takes Jane Austen's novel Emma, from the Gutenberg corpus, as an example to explain how natural language processing can be used to process the data.
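The kind of processing the abstract describes can be illustrated with a minimal sketch. It is not the author's code: the helper names (`load_emma_words`, `word_frequencies`, `regex_search`) and the choice of tokenizer are assumptions made for illustration. The sketch runs on the opening sentence of Emma so that no corpus download is needed. Loading the full novel assumes the Gutenberg corpus data has already been installed, for example with `nltk.download("gutenberg")`.

```python
import re

import nltk
from nltk.tokenize import wordpunct_tokenize

# Opening sentence of Emma, used so the sketch runs without downloading data.
SAMPLE = (
    "Emma Woodhouse, handsome, clever, and rich, with a comfortable home "
    "and happy disposition, seemed to unite some of the best blessings "
    "of existence; and had lived nearly twenty-one years in the world "
    "with very little to distress or vex her."
)


def load_emma_words():
    """Return the tokens of Emma from NLTK's Gutenberg corpus.

    Hypothetical helper: it assumes the corpus data is installed locally,
    e.g. after running nltk.download("gutenberg").
    """
    from nltk.corpus import gutenberg

    return list(gutenberg.words("austen-emma.txt"))


def word_frequencies(tokens):
    """Count lower-cased alphabetic tokens, ignoring punctuation."""
    return nltk.FreqDist(t.lower() for t in tokens if t.isalpha())


def regex_search(tokens, pattern):
    """Return the sorted set of tokens that fully match a regular expression."""
    compiled = re.compile(pattern)
    return sorted({t for t in tokens if compiled.fullmatch(t)})


# Regex-based tokenizer that needs no extra NLTK data files.
tokens = wordpunct_tokenize(SAMPLE)
freq = word_frequencies(tokens)

print(freq.most_common(3))             # most frequent words in the sample
print(regex_search(tokens, r"\w+ed"))  # tokens ending in "ed"
```

To analyse the whole novel, the sample tokens could be replaced with `tokens = load_emma_words()`. Tokenization, counting, and regular expression search all operate on the same Python token list, which reflects the uniform data handling the abstract refers to.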
Key words: Python, NLTK toolkit, corpus research
CLC Number: TP391.1
LIU Xu. The Application of NLTK Toolkit Based on Python in Corpus Research[J]. JOURNAL OF KUNMING METALLURGY COLLEGE, 2015, 31(5): 65-69.
URL: http://kmyzxb.magtech.com.cn/EN/10.3969/j.issn.1009-0479.2015.05.012
http://kmyzxb.magtech.com.cn/EN/Y2015/V31/I5/65